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1. Introduction 

Recent progress of lattice QCD simulations with dynamical flavors relies on the development 
of numerical algorithms and computational facilities. While the computational power has increased 
to a multi-Tera-flops level, we cannot yet simulate QCD at realistic quark masses. To overcome 
this status various numerical algorithms for dynamical lattice QCD have been proposed. 

The standard algorithm to simulate dynamical lattice QCD is the Hybrid Monte Carlo (HMC) 
algorithm Hence most of recent improvements on the lattice QCD algorithm aim to speed up 
the HMC algorithm. There are two key technologies in the literature; 

(1) Decouple UV and IR fermionic modes by preconditioning lattice Dirac operator, and modify 
the HMC Hamiltonian [§]. 

(2) Use the Sexton- Weingarten molecular dynamics (MD) integrator with multi (fictitious) time 
scales [Q], in which the IR modes of pseudo-fermions are assigned to the coarser time scales 
and the UV modes to the finer time scales. 

Various types of preconditioner have been proposed; the even/odd site preconditioner [Q, ||, g|, 
Hasenbusch's heavy mass preconditioner |Q, ^ |90, Liischer's even/odd domain decomposition 
(SAP) preconditioner [10], Polynomial preconditioner [O], n-th root multiple pseudo-fermion 



trick Q12 ], etc. These preconditioners combined with the Sexton- Weingarten MD integrator achieve 



a remarkable (a factor two to ten ) speed up over the naive HMC algorithm. 



In this article we investigate the UV-filtering preconditioner [ 13] for the 0(a) -improved Wilson 



Dirac [14] fermions. The UV-filtering preconditioner [13] has been proposed for the Multi-Boson 



(MB) algorithm [15, 16]. With this preconditioner the number of external multi -boson fields can 
be significantly reduced, leading to a sizable speed up of the MB algorithm [|13|]. We apply this 
UV-filtering preconditioner to the Polynomial Hybrid Monte Carlo (PHMC) algorithm [Q, [l7|]. In 
the next section we describe our UV-filtered Polynomial Hybrid Monte Carlo (UV-PHMC) algo- 
rithm. The numerical results are presented in Section |3[ where we investigate the efficiency of the 
algorithm for the plaquette gluon action and the 0(a) -improved Wilson quark action with j8 = 5.2, 
Nf = 2, c sw = 2.02, M K /M p ~ 0.8 and 0.7 on a 16 3 x 48 lattices. 

Using the PACS-CS computer [jl^] the PACS-CS collaboration is planning to further promote 
the Nf = 2 + 1 lattice QCD project that has been started by the CP-PACS/JLQCD joint collabora- 



tion [19]. The UV-filtered PHMC (UV-PHMC) algorithm will be applied to the single flavor part 



of the Nf ■ = 2 + 1 simulations. A status report of the PACS-CS collaboration is given in Ref. [2C]. 
2. Algorithm 

To describe the UV-PHMC algorithm we start with the lattice QCD partition function with the 
0(a) -improved Wilson fermion in the symmetrically even/odd-site preconditioned form [§, pl| |. 

3T = J &Udet[D[U]] N fe- SG P\- s *'W, (2.1) 

Sdv[U] = -NfTr[Log[T- 1 ]], (2.2) 
D = l ee - T ee M eo T 00 M oe = l ee - M ee , (2.3) 
T = (\+c sw ko-F)-\ (2.4) 
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where U denotes gauge links, So [U] is a gauge action, T is the local clover term with the clover 
leaf field strength F and the clover coefficient c sw . M eo (M oe ) is the single hopping matrix jumping 
from odd (even) sites to even (odd) sites, and D and M ee operate only on even sites. We apply the 
UV-filter preconditioner to D in Eq. (2.1). 

2.1 The UV-filter 

We introduce the UV-filter preconditioner P[U] as 

P[U]=exp[sM ee }, (2.5) 

where Y is a tunable parameter (UV-filter parameter). We can understand that this operator is a 
preconditioner by setting s = 1 as follows. 

Q[U] = P[U]D[U] = exp[M ee ](l -M ee ) = 1 - ^f- - ^f- (2.6) 

Since M ee is 0(k 2 ), P[U] removes 0(k 2 ) term and the preconditioned operator Q becomes 1 + 
0(k 4 ) close to identity matrix [13]. Using this preconditioner one can rewrite the quark determi- 
nant as 

det\D[U]] N f = detftPpV-ippMU]]"' 

= det[Q[U]ff e xp[- S N f Tr[M ee \] = det[Q[U}ff e xp[-N f S w[u] }, (2.7) 

where 

S m [U] =sTr[M ee ] =^ 2 £tr coloi , dirac [r( w )(l-7 M )^(«)r(« + At)(l + 7M)^(«)]- (2-8) 

n,n 

Note that 5 UV is still a local action and vanishes when c sw = 0. For the unimproved Wilson 
fermion further preconditioning, which removes 0(k 4 ) term, has been investigated in the MB 
algorithm [13]. In our improved case 0{k a ) preconditioning results in a complicated (non ultra- 
local) action for S uv and we do not investigate the C^K 4 ) preconditioner in this article. We employ 
Eq. Q for the UV-filter. 

2.2 Hybrid Monte Carlo algorithm with the UV-filter 

By applying the polynomial approximation Pn ly ~ (Q[U])~ l , and introducing a pseudo- 
fermion (j) and a fictitious momenta IT for gauge links, we obtain 



2? 




(2.9) 


H[U,<l>\(j>} 


= Tr[n 2 ] +S G [U] +S clv [U] +N f S uv [U]+S Q [U,<!>~ l ,<i>], 


(2.10) 




= \P Npoly [M ee ](j>\ 2 , 


(2.11) 


w 


= P Npoly [M ee ] (l-M ee ) exp [sMge] , 


(2.12) 










= £ c k {M ee )\ 
k=0 


(2.13) 
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where s and Ck are tuned to satisfy W ~ 1. The choice of s and will be described in the next 



subsection. When s = this action reduces to that for the normal PHMC algorithm. Eq. ( |2.13| ) 
applies to the Nf = 2 case. For the Nf = 1 case, we use the factorized polynomial instead of 
Eq. ( [2.13 ) as described in Ref. [21]. The actions So, S c iv> an d S w can be classified in the UV part, 



and Sq in the IR part. The effect of W is incorporated by the noisy-Metropolis test as having been 



used in the MB algorithm [J16p and CP-PACS/JLQCD's previous studies [J19p. We investigate the 
PHMC algorithm with Eq. 



The flow of the algorithm is almost the same as that described in Ref. [21] except for the 
following minor changes. 

• For the MD integrator we employ the Sexton- Weingarten MD integrator [^] with two time- 
step scales (UV and IR). The PUP-order integration scheme has been used in the literature. 
In this study we test the following UPU-order integrator; 

I OTq\ . I 8Zq \| 2 I / 8Zq \ c 7 8Zq 



""(2.14) 

where 8zq = z /No/Ni, 8zi = z/Ni, and z is trajectory length. U(8z) integrates gauge links 
by 8z, Px(8z) integrates gauge momenta by 8z. The UV-modes are integrated by F\jy and 
the IR-modes by Pir. A^o and N\ are the number of time-steps in each trajectory and A^o 
should be an even number in this scheme. 
• For the single flavor case we need to take the square root of the correction matrix W to do the 



noisy-Metropolis test. We tested a new algorithm of Ref. Q22[ ] for the matrix square root prob- 
lem. The algorithm utilizes the Krylov-subspace method via the Arnoldi factorization. Since 
the use of the Krylov-subspace method does not significantly affect the whole efficiency of 
the UV-PHMC algorithm, we will skip the details of the matrix square root algorithm in this 
article. 

• For the UV-filter we need to calculate the matrix exponential of M ee . We tested the following 
three methods; (i) the Taylor expansion approximation method, (ii) the Pade approximation 
method (without multi-shift solver), (iii) the Krylov-subspace approximation method 
We employ the Taylor expansion approximation method (i) because of its simplicity and 
moderate efficiency. The truncation error of the Taylor approximation is controlled by mon- 
itoring the spectrum norm of M ee . 

2.3 Choice of polynomial coefficients 

In order to minimize the cost of the algorithm, the UV-filter coefficient s and the polynomial 
coefficients Ck should be chosen to satisfy the condition that W ~ 1 with a small A'poiy- We investi- 
gated the following two coefficient schemes. 

(A) Taylor expansion method: By expanding [(1 — M ee )exp{sM ee ]]~ l with respect to M ee , we 
obtain 

^poly k (_ s )j 

[(l-M^exp^]]- 1 ~ £ Ci (M ee )*, with c* = £^^. (2.15) 

k=0 7=0 J ■ 

This is nothing but the hopping matrix expansion. 
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(B) Adopted polynomial method [ 13]: This method minimizes the following function; 

\ 2 



R(c,s) 



^poly 

£ Ck(M ee 

k=0 



;i -M ee )exp[sM ee ] - I t] 



|(W-1)T7| 



(2.16) 



where 17 is a Gaussian random noise vector and c = (co,ci,C2, . . . ,Cjv poly ). This method has 
been used in the MB algorithm as described in Ref. [[D]]. The function minimization is 
carried out by a linear fitting for c with a fixed s followed by Newton's method for s. We take 
several thermalized gauge configurations for the fitting. 



Figure[I] shows tne polynomial coefficients 
with the Adopted polynomial method (B). We 
observed that the dynamic range of the coeffi- 
cients spreads from 0(1O~ 2 ) to 0(1O 13 ). This 
means that careful treatment of the numerical 
accuracy and stability is required to compute 
the polynomial Pw poly within a finite precision 
arithmetic. Although we used double precision 
arithmetic and Clenshaw recurrence formula to 
construct the polynomial, we could not main- 
tain good accuracy and stability for Pn ]y . In 
the rest of paper, therefore, we employ the Tay- 



Adopted UV-filtered polynomial coefficients (config.#1) 
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lor expansion method (A) and Eq. ( J2. 15| ) for the 
coefficients. 

3. Numerical Results 



Figure 1: Adopted polynomial coefficients with s = 2 
and yVpdy = 70 determined on a thermalized config- 
uration at 16 3 x 48, /3 = 5.2, Nf = 2, k = 0.1350, 
c sw = 2.02. Similar behavior is observed for different 
s and yVpoiy. The configuration dependence is negligi- 
ble. 



Simulation 


H 


L 


K 


0.1340 


0.1350 


Mps/My 


~0.8 


~0.7 



Table 1: Simulation parameters. 



We employ the plaquette Wilson gauge action for Sg- 
Two quark masses are studied at j8 = 5.2 on a 16 3 x 48 lat- 
tice for Nf = 2 and c sw = 2.02 (see Table [l|). The simu- 
lations are denoted as H (heavy quark mass) and L (lighter 
quark mass). Table || and |3| present the simulation results for 
the norm of MD force for each sector and simulation statis- 
tics. The trajectory length z is set to unity, and No (Ni) is the 

number of time steps in for UV scale (IR scale) in a single trajectory. For comparison we tabu- 
late the PHMC algorithm and the symmetrically even/odd-site preconditioned HMC (SHMC) in 
the tables, s = PHMC is equivalent to s = of UV-PHMC. The definition of the force norm is 
\ F \ = Hr,,^ Tr [ F ti{n)F^(n)]/2/L 3 /T, where L = 16, T = 48, and F^n) is the MD force to drive 
gauge link U^{n). Pumc is the HMC Metropolis test acceptance rate, and Pgmp the global noisy 
Metropolis test acceptance rate. We monitored the averaged number of matrix vector multiplication 
of M ee to move forward the algorithm by one trajectory ("Mult/traj" in the tables). 

Without UV-filtering the MD force from pseudo-fermion (gauge) action \Fq\ (\Fq\) is about 
1.4 (4.5). The contribution from \F m \ and |F c i v | is smaller than that from |Fq| and \Fq\. We observe 
that \Fq\ depends on the UV-filter parameter s and takes its minimum value at s = 1. The reduction 
of \Fq\ from s = to s = 1 is about a factor three for both (L) and (H) lattices. Exploring No and Ni 
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traj. 


l-Pclv 


\Fg\ 






Prmc 


Pgmp 


Mult/traj 


[1,20,6] 


1.1 


1500 


0.282888(14) 


4.51815(22) 


0.493642(97) 


0.0455531(74) 


0.742(14) 


0.894(10) 


5587(13) 


[1,25,4] 


1.1 


1000 


0.28329(40) 


4.504(16) 


0.49404(89) 


0.04577(24) 


0.847(19) 


0.868(13) 


6679(17) 


[1.56,0] 


PHMC 


800 


0.282814(16) 


4.52032(34) 


1.33903(28) 




0.780(24) 


0.895(17) 


12456(15) 


Table 2: Simulation statistics with iVp iy = 80 (H) parameter. 




.V 


traj. 


|fclv| 


l*b| 


I*qI 


1^*11 V 


Phmc 


A3MP 


Mult/traj 


[1,25,8] 


0.0 


10 


0.286702(53) 


4.53781(27) 


1.490(25) 










[1,25,8] 


0.5 


10 


0.286761(40) 


4.53801(23) 


0.8285(16) 


0.0211877(54) 








[1,25,8] 


1.0 


10 


0.286812(50) 


4.53933(43) 


0.52395(64) 


0.042375(16) 








[1,25,8] 




1000 


0.2867765(90) 


4.53843(17) 


0.52682(26) 


0.0466221(59) 


0.669(17) 


0.863(16) 


12607(26) 


[1,25,4] 




1000 


0.286783(11) 


4.53845(30) 


0.52650(23) 


0.0466247(85) 


0.692(18) 


0.868(16) 


12640(29) 


[1,25,2] 




1000 


0.286798(11) 


4.53813(18) 


0.52763(29) 


0.0466381(66) 


0.664(17) 


0.902(11) 


12596(29) 


[1.25,0] 




1200 


0.286841(16) 


4.53814(36) 


0.52693(41) 


0.0466632(92) 


0.274(29) 


0.882(21) 


11992(45) 


[1,25,4]* 




900 


0.286805(22) 


4.53784(63) 


0.52736(29) 


0.046644(20) 


0.636(13) 


0.891(14) 


12958(35) 


[1,25,2]* 




1000 


0.286790(13) 


4.53858(34) 


0.52609(43) 


0.046627(10) 


0.531(26) 


0.864(17) 


12816(45) 


[1,25,8] 


1.5 


10 


0.286751(28) 


4.53789(24) 


0.74044(39) 


0.063577(17) 








[1,25,8] 


2.0 


10 


0.286861(33) 


4.53787(27) 


1.25248(53) 


0.084818(31) 








[1,70,0] 


PHMC 


1100 


0.286793(13) 


4.53816(29) 


1.39445(57) 




0.762(16) 


0.871(14) 


30385(20) 


[1,70,0] 


SHMC 


340 


0.286771(16) 


4.53909(21) 


1.39161(48) 




0.80(3) 




37491(166) 



Table 3: Simulation statistics with N po i y = 160 (L) parameter. [*: the PUP-order MD integrator is used.] 



[i,Ni,N ] 


s traj. 


l^clv I^G 


I*qI 


l^uv] 


Phmc 


Pgmp 


Mult/traj 


[1,25,4] 


1.1 1000 


0.2867632(93) 4.53873(20) 


0.363581(73) 


0.0466130(72) 


0.906(12) 


0.937(10) 
0.943(11) 


18896(110) 



Table 4: Simulation statistics with Nf = 1 + 1, A^ po iy = 160 (L) parameter. 



by keeping the HMC acceptance rate Phmc around 0.7, we get the computational cost reduction in 
Mult/traj by a factor two at s = 1 for both (L) and (H) lattices. Comparing the efficiency between the 
PUP-order and the UPU-order schemes at the (L) parameter, a small gain in the HMC acceptance 
is observed for the UPU-order scheme. 

Table || shows the result with Nf ■ = 1 + 1 simulation. The action contains two pseudo-fermions 
where one pseudo-fermion represents single flavor. The force norm \Fq | contains the force from 
both pseudo-fermions. As observed in Ref. [ pj ] the HMC acceptance rate becomes better than that 
with the Nf = 2 single pseudo-fermion simulation. The reason of the improvement using multiple 



pseudo-fermion is explained in Refs. [ |12| , |24fl . 

4. Summary and outlook 

In this work we have presented the effectiveness of the UV-filtering preconditioner to the 
Polynomial Hybrid Monte Carlo algorithm. The simulations have been carried out on lattices with 
moderate size and moderate quark masses. The UV-filtering preconditioner reduces the magnitude 
of MD force of the pseudo-fermion part and enables us to extend the MD time-step size of the 
pseudo-fermion. The gain in computational cost is a factor two on the lattices we have investigated. 
We have also tested the Nf = 1 + 1 case to confirm the efficiency of the single flavor algorithm. 
The UV-filtering for the Wilson type fermions is applicable to the heavy mass preconditioner by 
Hasenbusch [^] and the polynomial filtering and further speed up is expected. We are planning 
to apply the UV-filtered PHMC algorithm to the single flavor part of Nf ■ = 2 + 1 simulations. 



The simulation has been carried out on Hitachi SRI 1000 at Information Media Center of Hi- 
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